Activity 1 - CHICAGO CRIME DATA ANALYTICS
Analyst: Jessie O. Mompero Jr.
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import folium
from folium.plugins import HeatMap
DATASET
In [2]:
# Load the Chicago crime records (2001 to present); forward slashes also work on Windows.
chicago_df = pd.read_csv('datasets/chicago_2001_present.csv')
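Because the Chicago file is large, cells that only need a few columns could read just those columns. The sketch below is a minimal illustration, assuming a tiny made-up CSV held in memory in place of the real file; the column subset is a hypothetical choice, not the author's, and the cells above still need the full file for the null and dtype checks.

```python
import io

import pandas as pd

# A tiny inline CSV stands in for datasets/chicago_2001_present.csv (values are made up).
csv_text = """ID,Primary Type,Year,Latitude,Longitude,Ward
1,THEFT,2001,41.88,-87.62,42
2,BATTERY,2001,,,
3,THEFT,2002,41.75,-87.60,7
"""

# Read only the columns a later analysis needs (a hypothetical subset).
cols = ["Primary Type", "Year", "Latitude", "Longitude"]
df = pd.read_csv(io.StringIO(csv_text), usecols=cols)

# Empty coordinate fields are parsed as NaN, so Latitude stays float64.
print(df.shape)
print(df.dtypes["Latitude"])
```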
FILLING IN NULL VALUES
In [3]:
# Mark missing values explicitly instead of dropping those rows.
# Note: filling a numeric column with a string converts it to object dtype (see Out[4]).
for col in ['Location Description', 'Ward', 'Community Area',
            'X Coordinate', 'Y Coordinate', 'Location']:
    chicago_df[col] = chicago_df[col].fillna('unaccounted')
# District is a categorical code, so fill with the most frequent district rather than the mean.
chicago_df['District'] = chicago_df['District'].fillna(chicago_df['District'].mode().iloc[0])
# Rows without coordinates cannot be mapped, so drop them.
chicago_df = chicago_df.dropna(subset=['Latitude', 'Longitude'])
chicago_df.isnull().sum()
Out[3]:
ID                      0
Case Number             0
Date                    0
Block                   0
IUCR                    0
Primary Type            0
Description             0
Location Description    0
Arrest                  0
Domestic                0
Beat                    0
District                0
Ward                    0
Community Area          0
FBI Code                0
X Coordinate            0
Y Coordinate            0
Year                    0
Updated On              0
Latitude                0
Longitude               0
Location                0
dtype: int64
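Since District is a code rather than a measurement, filling its gaps with the mean can produce a district number that does not exist. The sketch below uses a handful of hypothetical District values (only the column name comes from the notebook) to compare mean and mode filling.

```python
import numpy as np
import pandas as pd

# Hypothetical District codes; only the column name comes from the notebook.
district = pd.Series([4.0, 7.0, 7.0, np.nan], name="District")

# mean() skips NaN: (4 + 7 + 7) / 3 = 6.0, which is not one of the observed codes.
by_mean = district.fillna(district.mean())

# mode() returns the most frequent value(s); take the first one.
by_mode = district.fillna(district.mode().iloc[0])

print(by_mean.tolist())
print(by_mode.tolist())
```

The mode keeps every filled value inside the set of districts that actually appear in the data, at the cost of slightly inflating the most common district's count.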
DATA TYPES
In [4]:
chicago_df.dtypes
Out[4]:
ID                      int64
Case Number            object
Date                   object
Block                  object
IUCR                   object
Primary Type           object
Description            object
Location Description   object
Arrest                   bool
Domestic                 bool
Beat                    int64
District              float64
Ward                   object
Community Area         object
FBI Code               object
X Coordinate           object
Y Coordinate           object
Year                    int64
Updated On             object
Latitude              float64
Longitude             float64
Location               object
dtype: object
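The output shows Ward, Community Area, X Coordinate and Y Coordinate as object because the 'unaccounted' label was written into numeric columns. If a later step needs them as numbers again, one option is pd.to_numeric with errors="coerce", sketched here on hypothetical Ward values.

```python
import pandas as pd

# Hypothetical Ward values after fillna('unaccounted'), which leaves the column as object.
ward = pd.Series([42.0, "unaccounted", 7.0], name="Ward")

# errors="coerce" turns non-numeric text back into NaN so numeric operations work again.
ward_num = pd.to_numeric(ward, errors="coerce")

print(ward.dtype, ward_num.dtype)
print(ward_num.isna().sum())
```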
Q1: YEAR 2001 ANALYSIS
In [5]:
# Keep only incidents recorded in 2001.
chicago_2001 = chicago_df[chicago_df['Year'] == 2001]
# Count incidents per primary crime type and keep the ten most frequent.
crime_counts = chicago_2001['Primary Type'].value_counts().head(10)
plt.figure(figsize=(10, 5))
sns.barplot(x=crime_counts.index, y=crime_counts.values, palette='magma')
plt.title('Top 10 Primary Crime Types in 2001')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Primary Type')
plt.ylabel('Number of Incidents')
plt.tight_layout()
plt.show()
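The filter-then-count step in the cell above can be sketched in plain Python, which may help show what value_counts().head(10) computes. This is a simplified analogue on hypothetical (year, crime type) records, not the notebook's code; tied counts may be ordered differently than pandas orders them.

```python
from collections import Counter

# Hypothetical (year, primary_type) records; real rows would come from chicago_df.
records = [
    (2001, "THEFT"), (2001, "BATTERY"), (2002, "THEFT"),
    (2001, "THEFT"), (2001, "NARCOTICS"), (2001, "BATTERY"),
    (2001, "THEFT"), (2001, "ASSAULT"),
]


def top_crimes(rows, year, n=10):
    """Filter rows to one year, then return the n most frequent crime types."""
    counts = Counter(crime for y, crime in rows if y == year)
    # most_common sorts by count, descending; ties keep first-seen order.
    return counts.most_common(n)


print(top_crimes(records, 2001, n=3))
```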
Insight No. 1
This chart shows the top 10 primary crime types in 2001. Theft is clearly number one, with nearly 100,000 records, which is surprisingly high. I think widespread poverty and limited economic opportunities may have pushed many people toward quick, low-risk ways to make money at the time. Theft's sharp lead suggests that policy should have prioritized social support and targeted prevention to address the root causes, rather than relying only on increased enforcement.
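How sharp theft's lead is can be put in numbers: its share of all incidents and its ratio to the runner-up. The sketch below is a minimal illustration on a made-up stand-in for chicago_2001['Primary Type']; it does not reproduce the real 2001 figures. Using value_counts(normalize=True) measures the share against all incidents rather than only the top 10.

```python
import pandas as pd

# Hypothetical stand-in for chicago_2001['Primary Type']; the values are made up.
primary_type = pd.Series(
    ["THEFT"] * 9 + ["BATTERY"] * 6 + ["NARCOTICS"] * 3 + ["ASSAULT"] * 2
)

counts = primary_type.value_counts()                # sorted, most frequent first
shares = primary_type.value_counts(normalize=True)  # fraction of ALL incidents

top, runner_up = counts.index[0], counts.index[1]
lead_ratio = counts.iloc[0] / counts.iloc[1]        # top category vs. second place

print(f"{top}: {shares[top]:.1%} of incidents, {lead_ratio:.2f}x {runner_up}")
```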
YEAR 2001 HEATMAP
In [6]:
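# Pair each 2001 incident's latitude and longitude for the heatmap layer.
locations = list(zip(chicago_2001['Latitude'], chicago_2001['Longitude']))
# Center the map on the mean incident location.
m = folium.Map(location=[chicago_2001['Latitude'].mean(), chicago_2001['Longitude'].mean()], zoom_start=10)
HeatMap(locations).add_to(m)
# Save a standalone HTML copy, then display the map inline.
m.save('insight_1.html')
m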
Out[6]:
[Interactive heatmap of 2001 incident locations, also saved as insight_1.html]
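A heatmap is, roughly, a picture of point density. As a simplified stand-in, the sketch below counts hypothetical incident coordinates per square grid cell and computes the mean-location center used for the map; the 0.01-degree cell size is an assumed value. This is not how folium's HeatMap works internally: it passes the raw points to a Leaflet plugin that draws a blurred density surface in the browser.

```python
from collections import Counter

# Hypothetical (latitude, longitude) pairs; real ones would come from chicago_2001.
points = [
    (41.881, -87.623), (41.882, -87.624), (41.883, -87.629),
    (41.751, -87.601), (41.752, -87.603),
]


def map_center(pts):
    """Mean latitude and longitude, the same centering rule the folium map uses."""
    lats, lons = zip(*pts)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def grid_density(pts, cell=0.01):
    """Count points per square grid cell; cell size is in degrees (an assumed value)."""
    return Counter((int(lat // cell), int(lon // cell)) for lat, lon in pts)


center = map_center(points)
density = grid_density(points)
hottest_cell, hottest_count = density.most_common(1)[0]
print(center, hottest_count)
```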